Setting the threshold for high throughput detectors: A mathematical approach for ensembles of dynamic, heterogeneous, probabilistic anomaly detectors

Abstract

Anomaly detection (AD) has garnered ample attention in security research, as such algorithms complement existing signature-based methods but promise detection of never-before-seen attacks. Cyber operations manage a high volume of heterogeneous log data; hence, AD in such operations involves multiple (e.g., per IP, per data type) ensembles of detectors modeling heterogeneous characteristics (e.g., rate, size, type) often with adaptive online models producing alerts in near real time. Because of high data volume, setting the threshold for each detector in such a system is an essential yet underdeveloped configuration issue that, if slightly mistuned, can leave the system useless, either producing a myriad of alerts and flooding downstream systems, or giving none. In this work, we build on the foundations of Ferragut et al. to provide a set of rigorous results for understanding the relationship between threshold values and alert quantities, and we propose an algorithm for setting the threshold in practice. Specifically, we give an algorithm for setting the threshold of multiple, heterogeneous, possibly dynamic detectors completely a priori, in principle. Indeed, if the underlying distribution of the incoming data is known (closely estimated), the algorithm provides provably manageable thresholds. If the distribution is unknown (e.g., has changed over time), our analysis reveals how the model distribution differs from the actual distribution, indicating a period of model refitting is necessary. We provide empirical experiments showing the efficacy of the capability by regulating the alert rate of a system with $\approx$2,500 adaptive detectors scoring over 1.5M events in 5 hours. Further, we demonstrate on the real network data and detection framework of Harshaw et al. the alternative case, showing how the inability to regulate alerts indicates the detection model is a bad fit to the data.
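The relationship between threshold and alert volume described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes each detector scores an event with a p-value under its own model and alerts when that p-value is at most a threshold `alpha`, so that under a well-fit model roughly `alpha * N` of `N` events alert. The Gaussian model, the function names (`alert_threshold`, `two_sided_p_value`, `count_alerts`, `simulate`), and all numbers are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist


def alert_threshold(alert_budget, expected_events):
    """Pick alpha so that alpha * expected_events equals the alert budget (capped at 1)."""
    if expected_events <= 0:
        raise ValueError("expected_events must be positive")
    return min(1.0, alert_budget / expected_events)


def two_sided_p_value(x, model):
    """Probability, under the Gaussian model, of a value at least as far from the mean as x."""
    z = abs(x - model.mean) / model.stdev
    return 2.0 * (1.0 - NormalDist().cdf(z))


def count_alerts(events, model, alpha):
    """Count events whose p-value under the model is at or below alpha."""
    return sum(1 for x in events if two_sided_p_value(x, model) <= alpha)


def simulate(seed=0, n_events=20000, budget=100):
    """Compare alert counts when the model fits the data and when the data has drifted."""
    rng = random.Random(seed)
    model = NormalDist(mu=0.0, sigma=1.0)  # assumed detector model
    alpha = alert_threshold(budget, n_events)
    well_fit = [rng.gauss(0.0, 1.0) for _ in range(n_events)]  # data matches the model
    drifted = [rng.gauss(0.0, 2.0) for _ in range(n_events)]   # data variance has changed
    return alpha, count_alerts(well_fit, model, alpha), count_alerts(drifted, model, alpha)


if __name__ == "__main__":
    alpha, well_fit_alerts, drifted_alerts = simulate()
    print("alpha:", alpha)
    print("alerts when the model fits:", well_fit_alerts)
    print("alerts after drift:", drifted_alerts)
```

In this toy setting the alert count stays near the budget while the model matches the data and far exceeds it once the data drifts, which mirrors the abstract's point that an inability to regulate alerts signals a poorly fitting model.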